[% setvar title Internally, data is stored as UTF8 %]
<div class='pod'>
<a name='TITLE'></a><h1>TITLE</h1>
<p>Internally, data is stored as UTF8</p>
<a name='VERSION'></a><h1>VERSION</h1>
<pre>  Maintainer: Simon Cozens &lt;<a href='mailto:simon@brecon.co.uk'>simon@brecon.co.uk</a>&gt;
  Date: 25 Sep 2000
  Mailing List: <a href='mailto:perl6-internals@perl.org'>perl6-internals@perl.org</a>
  Number: 294
  Version: 1
  Status: Developing</pre>
<a name='ABSTRACT'></a><h1>ABSTRACT</h1>
<p>We need to settle on an internal data format; this RFC proposes that
UTF8 should be that format.</p>
<a name='DESCRIPTION'></a><h1>DESCRIPTION</h1>
<p>Perl 5.6's Unicode support has been hampered by the fact that it was
grafted onto the side of the old string support, and so it tried to
handle both Unicode-encoded and non-Unicode data in the same structures;
this made it an absolute swine to do any manipulation properly on these
strings.</p>
<p>This could all be made a lot easier if we stuck to one single data
format for internal representation, just as most other languages out
there do. If we're going to have decent Unicode support, it naturally
needs to be a UTF. So which one?</p>
<p>UTF32 is just not going to fly: at four bytes per character it's too
big and bulky. UTF16 is sensible, but there's probably a lot more legacy
ASCII data out there than anything else, and ASCII text is already valid
UTF8 at one byte per character, so it makes sense to propose UTF8 as a
halfway house.</p>
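<p>As a rough illustration of the size trade-off described above, the
following sketch compares the encoded length of a few strings in the
three UTFs. It is written in Python purely for convenience; the sample
strings and the helper name <code>encoded_sizes</code> are hypothetical
and are not part of this RFC.</p>

```python
# Hypothetical sample strings; none of these come from the RFC itself.
SAMPLES = {
    "ascii": "hello, world",
    "latin": "na\u00efve caf\u00e9",
    "cjk": "\u65e5\u672c\u8a9e",
}


def encoded_sizes(text):
    """Return the byte length of text in UTF-8, UTF-16 and UTF-32.

    The little-endian codec variants are used so that no byte order
    mark is added to the count.
    """
    return {
        "utf8": len(text.encode("utf-8")),
        "utf16": len(text.encode("utf-16-le")),
        "utf32": len(text.encode("utf-32-le")),
    }


if __name__ == "__main__":
    for name, text in SAMPLES.items():
        # Print the label, the number of characters, and the byte counts.
        print(name, len(text), encoded_sizes(text))
```

<p>For pure ASCII input the UTF8 length equals the character count,
while UTF16 doubles it and UTF32 quadruples it. For the CJK sample UTF8
comes out larger than UTF16, which is the cost accepted by treating
UTF8 as a halfway house.</p>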
<a name='IMPLEMENTATION'></a><h1>IMPLEMENTATION</h1>
<p>We'll need to get data into Unicode, and we'll need to handle data
internally; I have a separate RFC about each. This RFC merely
establishes that we need a single internal data format, for simplicity,
and that it should be UTF8.</p>
<a name='REFERENCES'></a><h1>REFERENCES</h1>
<p>The Unicode FAQ on UTFs and BOMs (an excellent introduction to what
UTFs are, what they look like and how they work):
<a href='http://www.unicode.org/unicode/faq/utf_bom.html' target='_blank'>www.unicode.org</a></p>
<p>RFC 295: Normalisation and <code>unicode::exact</code></p>
<p>RFC ??: When UTF8 leaks out</p>
<p>RFC 300: <code>use unicode::representation</code></p>
<p>RFC 312: Unicode Combinatorix</p>
<p>RFC 296: Getting Data Into Unicode Is Not Our Problem</p>
<p>RFC ??: Unicode Locales</p>
<p>RFC ??: Abstract the Internal String Interaction</p>
</div>
